chore: Modernize the MongoDB Atlas Mixin #1544
base: master
```diff
 - alert: MongoDBAtlasElectionTimeouts
   annotations:
-    description: The number of elections being called due to the primary node timing out in replica set {{$labels.rs_m}} in cluster {{$labels.cl_name}} is {{printf "%.0f" $value}} which is above the threshold of 10.
+    description: The number of elections being called due to the primary node timing out in replica set {{$labels.rs_nm}} in cluster {{$labels.cl_name}} is {{printf "%.0f" $value}} which is above the threshold of 10.
```
Identified a typo here: `rs_m` should be `rs_nm`.
Dasomeone left a comment
Have to leave it here as it's the end of the day, but the first-pass review is more or less done.
Generally, the layout is A+ and I'm perfectly happy with it.
A couple of suggestions for improvements around legend tables and filtering. I haven't done a pass on the usage of common-lib yet, so no comments there.
```jsonnet
dashboardTimezone: 'default',
dashboardRefresh: '1m',
// Basic filtering - MongoDB Atlas uses job and cl_name (cluster name) as primary filters
filteringSelector: 'job="integrations/mongodb-atlas"',
```
As we recently discussed, we can vendor the latest logs-lib and unset this for the public mixin.
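A minimal sketch of how that split could look. It assumes the mixin exposes its settings under a hidden `_config` field imported from a hypothetical `mixin.libsonnet`, and that the newer logs-lib treats an empty `filteringSelector` as "no extra filter"; both are assumptions to check against the vendored versions.

```jsonnet
// Hypothetical sketch: `mixin.libsonnet` and the `_config` field path are
// assumptions; adjust them to however this mixin actually exposes its config.
local mixin = import 'mixin.libsonnet';

// Public mixin: ship without the integration-specific job filter, assuming
// the newer logs-lib accepts an empty selector as "no filter".
local publicMixin = mixin + { _config+:: { filteringSelector: '' } };

// Integration build: re-apply the original selector on top of the public defaults.
publicMixin + {
  _config+:: {
    filteringSelector: 'job="integrations/mongodb-atlas"',
  },
}
```

Keeping the selector as a plain override means the integration does not need its own fork of the config; it only layers one field on top of the public defaults.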
```jsonnet
alertsDeadlocks: 10, // count
alertsSlowNetworkRequests: 10, // count
alertsHighDiskUsage: 90, // %
alertsSlowHardwareIO: 3, // seconds
```
As I mentioned on a previous PR, we could consider coupling the threshold units more tightly to the native metric unit (e.g. milliseconds) to simplify the query.
```jsonnet
{
  alert: 'MongoDBAtlasSlowHardwareIO',
  expr: |||
    (sum without (disk_name) (increase(hardware_disk_metrics_read_time_milliseconds[5m])) + sum without (disk_name) (increase(hardware_disk_metrics_write_time_milliseconds[5m]))) / 1000 > %(alertsSlowHardwareIO)s
```
Same suggestion here: keeping the threshold in the metric's native unit (milliseconds) would let us drop the `/ 1000` conversion from this query.
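A rough sketch of the alert with the threshold kept in milliseconds, the native unit of `hardware_disk_metrics_read_time_milliseconds` and `hardware_disk_metrics_write_time_milliseconds`. The local `config` object stands in for the mixin's real config, and `3000` simply restates the current 3-second default in ms; it is not a recommended value.

```jsonnet
// Sketch only: `config` is a stand-in for the mixin's actual config object.
local config = {
  alertsSlowHardwareIO: 3000,  // milliseconds (previously 3, in seconds)
};

{
  alert: 'MongoDBAtlasSlowHardwareIO',
  // No unit conversion needed: both sides of the comparison are in ms.
  expr: |||
    sum without (disk_name) (increase(hardware_disk_metrics_read_time_milliseconds[5m]))
    + sum without (disk_name) (increase(hardware_disk_metrics_write_time_milliseconds[5m]))
    > %(alertsSlowHardwareIO)s
  ||| % config,
}
```

In PromQL, `+` binds tighter than `>`, so the outer parentheses that the `/ 1000` version needed can go as well.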
```jsonnet
hardwareIO:
  commonlib.panels.generic.timeSeries.base.new('Hardware I/O', targets=[
    signals.cluster.diskReadCount.asTarget(),
    signals.cluster.diskWriteCount.asTarget(),
  ])
  + g.panel.timeSeries.panelOptions.withDescription("The number of read and write I/O's processed.")
  + g.panel.timeSeries.standardOptions.withUnit('iops')
  + g.panel.timeSeries.options.legend.withPlacement('right')
  + g.panel.timeSeries.options.legend.withAsTable(true),

hardwareIOWaitTime:
  commonlib.panels.generic.timeSeries.base.new('Hardware I/O wait time / $__interval', targets=[
    signals.cluster.diskReadTime.asTarget()
    + g.query.prometheus.withInterval('2m'),
    signals.cluster.diskWriteTime.asTarget()
    + g.query.prometheus.withInterval('2m'),
  ])
  + g.panel.timeSeries.panelOptions.withDescription('The amount of time spent waiting for I/O requests.')
  + g.panel.timeSeries.standardOptions.withUnit('ms')
  + g.panel.timeSeries.options.tooltip.withSort('desc')
  + g.panel.timeSeries.options.legend.withPlacement('right')
  + g.panel.timeSeries.options.legend.withAsTable(true),
```
For these two panels, I think it would be beneficial to use the table legend options to add last*, min, mean, and max columns. They should be available via the standard options, though I can't remember exactly where off the top of my head; Gabriel used them in the Postgres mixin just last week.
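For reference, in grafonnet the legend calculations appear to be set through `options.legend.withCalcs` rather than the standard options, with `lastNotNull` being the reducer behind Grafana's "Last *" column. A sketch of a shared legend mixin, assuming the current grafonnet import path; `legendTableWithCalcs` is a hypothetical helper name, and in the mixin it would be appended to the existing `commonlib`-based panels rather than to `g.panel.timeSeries.new`.

```jsonnet
local g = import 'github.com/grafana/grafonnet/gen/grafonnet-latest/main.libsonnet';

// Hypothetical helper: table legend on the right with Last *, Min, Mean and Max columns.
local legendTableWithCalcs =
  g.panel.timeSeries.options.legend.withPlacement('right')
  + g.panel.timeSeries.options.legend.withAsTable(true)
  + g.panel.timeSeries.options.legend.withCalcs(['lastNotNull', 'min', 'mean', 'max']);

{
  hardwareIO:
    g.panel.timeSeries.new('Hardware I/O')
    + g.panel.timeSeries.standardOptions.withUnit('iops')
    + legendTableWithCalcs,

  hardwareIOWaitTime:
    g.panel.timeSeries.new('Hardware I/O wait time / $__interval')
    + g.panel.timeSeries.standardOptions.withUnit('ms')
    + g.panel.timeSeries.options.tooltip.withSort('desc')
    + legendTableWithCalcs,
}
```

Factoring the legend options into one local keeps the I/O panels (and any others that adopt the same columns) consistent.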
```jsonnet
  + g.panel.timeSeries.standardOptions.withUnit('reqps')
  + g.panel.timeSeries.options.tooltip.withSort('desc'),

networkThroughput:
```
+1 for last* and at least mean as additional data columns, for a quick overview on the side.
```jsonnet
//
// Elections panels
//
```
```jsonnet
  + g.panel.timeSeries.standardOptions.withUnit('reqps')
  + g.panel.timeSeries.options.tooltip.withSort('desc'),

slowNetworkRequestsPerformance:
```
+1 for this panel, and for networkThroughputPerformance in general as well: both may need some filtering, since as they are now they will keep affecting the y-axis scaling.
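The comment doesn't specify which filter is intended. One option, assuming the goal is to stop a handful of high-volume series from dominating the y-axis, is to cap the number of series with PromQL `topk`. In this sketch `example_network_requests_total` is a placeholder rather than a real MongoDB Atlas metric, `topSeries` is a hypothetical helper, and `$datasource` assumes the dashboard's datasource variable uses that name.

```jsonnet
local g = import 'github.com/grafana/grafonnet/gen/grafonnet-latest/main.libsonnet';

// Hypothetical helper: keep only the n busiest instances for a counter metric.
local topSeries(metric, n=5) =
  'topk(%d, sum by (instance) (rate(%s[$__rate_interval])))' % [n, metric];

g.panel.timeSeries.new('Slow network requests')
+ g.panel.timeSeries.queryOptions.withTargets([
  // Placeholder metric name; substitute the signal the panel actually uses.
  g.query.prometheus.new('$datasource', topSeries('example_network_requests_total'))
  + g.query.prometheus.withLegendFormat('{{instance}}'),
])
+ g.panel.timeSeries.standardOptions.withUnit('reqps')
```

Note that `topk` is evaluated at every step of a range query, so over a long time range the legend can still list more than `n` distinct series.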
```jsonnet
  + g.panel.timeSeries.options.legend.withPlacement('right')
  + g.panel.timeSeries.options.legend.withAsTable(true),

hardwareIOWaitTime:
```
+1 for filtering; I'll stop commenting on these now.
Some of the screenshots are missing data, mostly because I didn't set up sharding, but the queries and functionality should be very similar to the original.
MongoDB Atlas cluster overview

Paginated the tables 🚀




MongoDB Atlas elections overview


MongoDB Atlas operations overview



MongoDB Atlas performance overview



MongoDB Atlas sharding overview


